The Potential of Multi-Task Learning in CFDST Design: Load-Bearing Capacity Design with Three MTL Models

Concrete-filled double steel tubes (CFDSTs) are a load-bearing structure of composite materials. By combining concrete and steel pipes in a nested structure, the performance of the column will be greatly improved. The performance of CFDSTs is closely related to their design. However, existing codes for CFDST design often focus on how to verify the reliability of a design, but specific design parameters cannot be directly provided. As a machine learning technique that can simultaneously learn multiple related tasks, multi-task learning (MTL) has great potential in the structural design of CFDSTs. Based on 227 uniaxial compression cases of CFDSTs collected from the literature, this paper utilized three multi-task models (multi-task Lasso, VSTG, and MLS-SVR) separately to provide multiple parameters for CFDST design. To evaluate the accuracy of models, four statistical indicators were adopted (R2, RMSE, RRMSE, and ρ). The experimental results indicated that there was a non-linear relationship among the parameters of CFDSTs. Nevertheless, MLS-SVR was still able to provide an accurate set of design parameters. The coefficient matrices of two linear models, multi-task Lasso and VSTG, revealed the potential connection among CFDST parameters. The latent-task matrix V in VSTG divided the prediction tasks of inner tube diameter, thickness, strength, and concrete strength into three groups. In addition, the limitations of this study and future work are also summarized. This paper provides new ideas for the design of CFDSTs and the study of related codes.


Introduction
Incorporating metal components into concrete is a common approach to reinforce concrete columns [1], and concrete-filled double steel tubes (CFDSTs) have gained considerable traction in recent years. CFDSTs are load-bearing structures produced by filling concrete between two concentrically placed steel tubes. This unique structure enables CFDSTs to achieve superior load-bearing capacity while being lighter in weight. Thus, CFDSTs are widely used in large-scale structures such as super high-rise buildings and highway bridges [2].
As the most fundamental property of CFDSTs, the load-bearing capacity always needs to be considered. Although some international design codes that emphasize load-bearing capacity already exist for CFDST design, such as Eurocode-4 (EC4) [13], ACI [14], and AISC [15], their reliability remains questionable [16]. Lama et al. [17] used a non-linear finite element technique to analyze the axial compression capacity of CFDSTs made from a novel material combination. Comparing the above codes with the numerical results, they found that EC4 and AISC were not suitable for this type of CFDST. Hassanein et al. [18] mentioned that the existing design codes for CFDSTs do not take the confinement effect of the tubes into account, which is why most models are too conservative in predicting the load-bearing capacity of CFDSTs. To avoid the various constraints that make theoretical approaches more complex, machine learning techniques have been adopted, and they have achieved an accuracy far higher than EC4, ACI, and AISC [16]. In other studies, e.g., those of Tran and Kim [19], Wang et al. [4], and Chandramouli et al. [20], the prediction accuracy of the above codes (EC4, ACI, and AISC) was also found to be inferior to that of recently proposed methods.
Materials 2024, 17, 1994
However, the core idea of current studies on CFDST design is still to find a mapping from the design parameters to a certain property (e.g., load-bearing capacity) and then to verify, via the prediction of this property, whether the design is reliable. In a nutshell, the existing methods are more concerned with "how to verify whether a design is reliable" than with "how to provide a reliable design directly". This is understandable, as the second issue is a more difficult problem involving multiple outputs.
Figure 1 shows the structure of a conventional CFDST. The main structural parameters include the diameters (D) and thicknesses (t) of the steel tubes, as well as the column length (H). Among them, the length of the column is often constant in a specific project. Therefore, the design usually focuses on the dimensions and properties of the steel tubes, as well as the mechanical properties of the concrete. Providing guidance on several design parameters simultaneously is a challenge for the aforementioned methods.
As a branch of machine learning, multi-task learning (MTL) has been widely used in fields such as stellar spectra parameterization [21], the prediction of disease cognitive scores [22], and short-term wind speed prediction [23]. By exploiting the potential correlations among multiple tasks, MTL has shown good performance in solving various multi-output problems. Therefore, using the MTL technique to guide the structural design of CFDSTs is a promising topic. In addition, using the non-linear finite element technique to verify the reliability of a CFDST design is often complex. For slender and thin-walled structures, the accuracy requirements for modeling and solving vary at different scales [24,25]. The calculation strategy often determines whether the model can converge quickly and predict accurately at a limited computational cost on an ordinary computer [26], especially for composite structures [27] such as CFDSTs. With guidance from MTL, unnecessary trial and error can be avoided.
From the perspective of load-bearing capacity, this research attempted to utilize three distinct MTL models for the design of CFDSTs. Based on 227 CFDST cases with circular cross-sections collected from previous literature, the MTL models were trained to provide a reliable set of design parameters. These parameters can serve as references to achieve the desired load-bearing performance. Furthermore, several statistical indicators were used to evaluate the reliability of the provided parameters.
The main content of this paper includes the following sections: Section 2 mainly introduces the models used and explains the model development process; Section 3 presents the hyper-parameter settings and experimental results; Section 4 discusses the findings from the results and lists the limitations of the experiment, as well as discussing future work; Section 5 summarizes the content and significance of this study.

Multi-Task Learning Models
In supervised learning, a conventional model can learn only one task during the training phase. Complex problems are therefore usually decomposed into several simple problems, which are then solved separately using single-task learning (STL) models. Multi-task learning (MTL) is another kind of supervised learning technique that achieves inductive transfer among multiple tasks. Its information-sharing mechanism enables MTL to update the parameters of all tasks in a single pass over the data. More importantly, this mechanism can enhance the generalization ability of each task [28].
The design guidance for CFDSTs requires a model to provide multiple parameters simultaneously. These output parameters are highly correlated, and each of them can correspond to a task in MTL. This characteristic makes CFDST design compatible with the MTL technique. This section summarizes the three MTL methods used in this study.

Multi-Task Lasso
Lasso regression is a classic linear regression model proposed by Tibshirani [29]. By introducing the $L_1$ norm into the loss function of least squares regression, Lasso regression achieves a better generalization ability than conventional linear regression. The loss function of basic Lasso is given as follows:

$$\min_{W}\ \frac{1}{2n}\left\|Y - XW\right\|_2^2 + \lambda\left\|W\right\|_1 \qquad (1)$$

where $Y$ is the $n$-dimensional output vector; $X$ is the matrix of input features; $W$ is the coefficient vector of the linear model; $\|\cdot\|_1$ is the $L_1$ norm of a vector; $\|\cdot\|_2$ is the $L_2$ norm of a vector; and $\lambda$ is the adjustment coefficient.
Because the $L_1$ regularization term $\|W\|_1$ imposes a sparsity constraint on the coefficient vector, the coefficients of unimportant input features are set to 0. This characteristic also allows Lasso to be widely used in feature selection.
Multi-task Lasso is an extended version of Lasso. It was presented by Obozinski et al. [30] for the problem of multi-task feature selection. Multi-task Lasso assumes that multiple similar tasks have correlated feature selection. Replacing the $L_1$ norm in Lasso with the $L_{2,1}$ norm leads to information sharing among different tasks during model iteration: because the $L_{2,1}$ norm sums the $L_2$ norms of the rows of $W$, a feature tends to be kept or discarded for all tasks together. The loss function of multi-task Lasso is given as follows:

$$\min_{W}\ \frac{1}{2n}\left\|Y - XW\right\|_{Fro}^2 + \lambda\left\|W\right\|_{2,1} \qquad (2)$$

where $Y$, $X$, and $W$ take matrix form; $\|\cdot\|_{Fro}$ is the Frobenius norm of a matrix; and $\|\cdot\|_{2,1}$ is the $L_{2,1}$ norm of a matrix. The mechanism of the $L_{2,1}$ norm in MTL is detailed in the work of Liu et al. [31].
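The row-wise sparsity induced by the $L_{2,1}$ norm can be illustrated with a small numerical sketch. The code below is not the implementation used in this study; it is a minimal proximal-gradient solver written for illustration, assuming the loss is scaled as $\frac{1}{2n}\|Y - XW\|_{Fro}^2 + \lambda\|W\|_{2,1}$ with $W$ arranged as features by tasks. The function name `multitask_lasso`, the synthetic data, and the value of `lam` are all hypothetical.

```python
import numpy as np

def multitask_lasso(X, Y, lam, n_iter=500):
    """Proximal gradient descent for (1/2n)||Y - XW||_Fro^2 + lam * ||W||_{2,1}."""
    n, d = X.shape
    W = np.zeros((d, Y.shape[1]))
    # Step size = 1 / Lipschitz constant of the gradient of the smooth term.
    step = n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y) / n
        Z = W - step * grad
        # Row-wise soft-thresholding: a whole row (one feature, all tasks) shrinks together.
        row_norm = np.linalg.norm(Z, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - step * lam / np.maximum(row_norm, 1e-12))
        W = shrink * Z
    return W

# Synthetic data (hypothetical): only the first two of six features matter, for all three tasks.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
W_true = np.zeros((6, 3))
W_true[:2] = rng.uniform(1.0, 2.0, size=(2, 3))
Y = X @ W_true + 0.05 * rng.normal(size=(200, 3))

W_hat = multitask_lasso(X, Y, lam=0.1)
row_norms = np.linalg.norm(W_hat, axis=1)  # one value per input feature
```

In this sketch, the rows of `W_hat` that belong to the irrelevant features are driven to zero for every task at once, which is the shared feature selection described above.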

VSTG
Variable selection and task grouping (VSTG) is another MTL model developed based on linear regression [32]. Two fundamental ideas underlie VSTG: the low-rank hypothesis and the structure approach. As prior knowledge, the low-rank hypothesis assumes that the information required for multiple similar tasks is always redundant. For example, the more closely two tasks are related, the more similar their selection of input features is likely to be. Therefore, when all tasks are trained simultaneously, the matrix composed of the task coefficients should be low rank. In MTL, low-rank constraints encourage models to develop towards low-rank structures. The structure approach means that the association between multiple tasks can be represented by a certain structure. The group structure is the most commonly used type, and it is also adopted by VSTG. In VSTG, similar tasks are considered to belong to the same group, and tasks within the same group share more information during the model development phase.
VSTG decomposes the coefficient matrix, $W$, into two matrices, $U$ and $V$. Moreover, latent bases are introduced to learn and describe the overlapping structures among different tasks. The matrix decomposition is shown in Figure 2, where the dark cells represent non-zero values and the light cells represent zero values.
The matrix $U$ is called the variable-latent matrix, and it records the principal components of the input features after the dimensionality reduction operation. The matrix $V$ is called the latent-task matrix, and it records the group structure of the tasks. For example, the column vectors $\vec{v}_1$ and $\vec{v}_2$ are similar in Figure 2, indicating that Task 1 and Task 2 are highly correlated. Thus, these two tasks should belong to the same group. In the training phase, the two matrices $U$ and $V$ are updated via ADMM (alternating direction method of multipliers). The loss function of VSTG is given as follows:

$$\min_{U,V}\ \sum_{i=1}^{m}\frac{1}{2N}\left\|\vec{y}_i - XU\vec{v}_i\right\|_2^2 \quad \text{s.t.}\quad \|U\|_1 \le \varphi_1,\quad \|U\|_{1,\infty} \le \varphi_2,\quad \sum_{i=1}^{m}\left(\|\vec{v}_i\|_k^{sp}\right)^2 \le \theta \qquad (3)$$

where $\vec{y}_i$ is the output vector of the $i$-th task; $\vec{v}_i$ is the $i$-th column vector in matrix $V$; $m$ is the number of tasks; $N$ is the number of training samples; $\varphi_1$, $\varphi_2$, and $\theta$ are the parameters of the low-rank constraints; $\|\cdot\|_{1,\infty}$ is the $L_{1,\infty}$ norm; and $\|\cdot\|_k^{sp}$ is the k-support norm.
The constraints in Equation (3) can also be written as regularization terms. Thus, the problem can be transformed into a regularized objective function, as follows:

$$\min_{U,V}\ \sum_{i=1}^{m}\frac{1}{2N}\left\|\vec{y}_i - XU\vec{v}_i\right\|_2^2 + \lambda_1\|U\|_1 + \lambda_2\|U\|_{1,\infty} + \mu\sum_{i=1}^{m}\left(\|\vec{v}_i\|_k^{sp}\right)^2 \qquad (4)$$

where $\lambda_1$, $\lambda_2$, and $\mu$ are regularization parameters.
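The role of the two factors can be made concrete with a toy example. The sketch below does not train VSTG; it only builds a hypothetical pair of matrices $U$ and $V$ by hand (all values are invented for illustration) to show that $W = UV$ is low rank, that an all-zero row of $U$ removes an input feature from every task, and that similar columns of $V$ indicate tasks in the same group. Cosine similarity is used here as one simple, assumed way to compare columns.

```python
import numpy as np

# Hypothetical variable-latent matrix: 4 input features -> 2 latent bases.
# The last row is all zero, so the fourth feature is not selected at all.
U = np.array([[1.0, 0.0],
              [0.5, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])

# Hypothetical latent-task matrix: 2 latent bases -> 3 tasks.
# Columns 0 and 1 are nearly parallel; column 2 uses a different base.
V = np.array([[1.0, 0.9, 0.0],
              [0.0, 0.1, 1.0]])

W = U @ V                              # coefficient matrix, features x tasks
rank_W = np.linalg.matrix_rank(W)      # bounded by the number of latent bases

def cosine(a, b):
    """Cosine similarity between two column vectors of V."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_task1_task2 = cosine(V[:, 0], V[:, 1])   # close to 1: same group
sim_task1_task3 = cosine(V[:, 0], V[:, 2])   # 0: different groups
```

With these values, Tasks 1 and 2 would be placed in the same group, while Task 3 forms its own group, mirroring the reading of Figure 2 described above.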
MLS-SVR
SVR (support vector regression) is a concise and high-performance regression model [33,34]. With the maximum margin mechanism, SVR aims to find the best fitting position for linear models in the feature space. The few vectors that determine the position of the margins are called support vectors. Although SVR uses linear models for regression, it can also be applied to some complex non-linear problems by adopting soft margins and kernel methods [35–39].
MLS-SVR (multi-output least-squares SVR) is an MTL method developed based on SVR [40]. Unlike the idea of matrix decomposition in VSTG, MLS-SVR assumes that the coefficient vectors of different tasks evolve from an initial coefficient vector. The coefficient vector of the $i$-th task is rewritten as $\vec{w}_i = \vec{w}_o + \vec{u}_i$, so that $\vec{w}_o$ carries the information shared by all tasks and $\vec{u}_i$ carries the task-specific part. In MLS-SVR, the traditional SVR optimization problem is rewritten into a matrix version, as follows:

$$\min_{\vec{w}_o,\,\Lambda,\,\vec{b}}\ \frac{1}{2}\vec{w}_o^{T}\vec{w}_o + \frac{\lambda}{2m}\,\mathrm{tr}\left(\Lambda^{T}\Lambda\right) + \frac{\beta}{2}\,\mathrm{tr}\left(\Theta^{T}\Theta\right) \quad \text{s.t.}\quad Y = \Omega^{T}W + \vec{1}_N\vec{b}^{T} + \Theta \qquad (5)$$

where $\vec{w}_o$ represents the initial coefficient vector; $W$ is the coefficient matrix whose columns are $\vec{w}_i$; $\Lambda$ is a matrix composed of $\vec{u}_i$; $\Theta$ denotes a matrix composed of the slack variables from each task; $\Omega$ is a matrix used to map the input features to a higher-dimensional space; $\vec{b}$ is the bias vector; $\vec{1}_N$ is an $N$-dimensional vector of ones; $p$ and $m$ are the dimensions of the inputs and outputs, respectively; $N$ is the number of samples in the training set; and $\lambda$ and $\beta$ are regularization parameters.
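The idea that every task coefficient vector is a shared vector plus a task-specific deviation can be sketched numerically. The code below is not the kernelized MLS-SVR used in this study; it is a simplified primal version with a linear feature map, assuming the objective $\frac{1}{2}\|\vec{w}_o\|^2 + \frac{\lambda}{2m}\sum_i\|\vec{u}_i\|^2 + \frac{\beta}{2}\|\Theta\|_{Fro}^2$ and solving it as one regularized least-squares problem. The function name `fit_linear_mls`, the synthetic data, and the parameter values are hypothetical.

```python
import numpy as np

def fit_linear_mls(X, Y, lam=1.0, beta=100.0):
    """Primal, linear sketch: task i predicts with X @ (w_o + u_i) + b_i."""
    N, d = X.shape
    m = Y.shape[1]
    # Stacked design matrix over unknowns [w_o, u_1..u_m, b_1..b_m].
    A = np.zeros((N * m, d + m * d + m))
    for i in range(m):
        rows = slice(i * N, (i + 1) * N)
        A[rows, :d] = X                              # shared vector w_o
        A[rows, d + i * d:d + (i + 1) * d] = X       # task-specific vector u_i
        A[rows, d + m * d + i] = 1.0                 # bias b_i
    y = Y.T.reshape(-1)                              # task 1 samples, then task 2, ...
    # Penalty weights: 1 for w_o, lam/m for each u_i, 0 for the biases.
    penalty = np.concatenate([np.ones(d), np.full(m * d, lam / m), np.zeros(m)])
    z = np.linalg.solve(beta * A.T @ A + np.diag(penalty), beta * A.T @ y)
    w_o = z[:d]
    U = z[d:d + m * d].reshape(m, d).T               # column i is u_i
    b = z[d + m * d:]
    return w_o, U, b

# Synthetic two-task data (hypothetical) whose coefficients share a common part.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
W_true = np.array([[1.0], [-2.0], [0.5]]) + 0.1 * rng.normal(size=(3, 2))
b_true = np.array([0.3, -0.7])
Y = X @ W_true + b_true + 0.01 * rng.normal(size=(80, 2))

lam = 1.0
w_o, U, b = fit_linear_mls(X, Y, lam=lam, beta=100.0)
W = w_o[:, None] + U                                 # per-task coefficient vectors
max_err = np.abs(X @ W + b - Y).max()
```

In this simplified setting, the optimum satisfies $\vec{w}_o = \frac{\lambda}{m}\sum_i \vec{u}_i$, so the shared vector is tied to the task-specific deviations through $\lambda$. A kernel version would replace the explicit features `X` with the mapping $\Omega$.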
Table 1 summarizes the 9 parameters required for CFDST design in the database. These parameters are related to the strength of the materials (concrete and steel tubes), the tube dimensions, and the CFDST load-bearing capacity. Statistical descriptions of the database are given in Table 2. The column length ranged from 230 to 3502 mm, covering CFDST cases from the laboratory scale to the site scale. Figure 4 displays the scatter plot matrix of the database together with the correlation coefficients between variables. Three parameters, namely the diameter of the outer steel tube ($D_o$), the diameter of the inner steel tube ($D_i$), and the axial compression capacity of the column ($N_u$), were highly correlated. Both $D_o$ and $D_i$ were positively correlated with $N_u$, which makes sense because a larger column usually means a higher load-bearing capacity.

Model Development
Because of various considerations such as materials and economy, the design of CFDSTs is often customized. In practice, a certain type of steel tube or concrete may be prioritized for use, or tubes with a specific dimension may have to be adopted due to special needs (e.g., outer diameter or self-weight of the column). Therefore, the experiments in this study only demonstrated the scenario of providing guidance on the inner steel tube and concrete based on the selected outer steel tube.
The inputs and outputs are shown in Table 3. This section demonstrates the development process of the MTL models based on the example scenario. A similar development pattern can be transferred to other situations that have not been demonstrated.

Table 3. The inputs and outputs in the example scenario.
Inputs: $D_o$, $t_o$, $f_{yo}$, $H$, $N_u$
Outputs: $D_i$, $t_i$, $f_{yi}$, $f_{co}$

Different magnitudes of variables may lead to inaccuracy in machine learning models. Especially for MTL, multiple tasks cannot be trained effectively if there is a significant difference in the units. Thus, all variables were normalized into [−10, 10]. As the values in the commonly used normalization range [−1, 1] are too small, the range [−10, 10] enables the data-sensitive linear models to learn suitable coefficients successfully.
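A normalization of this kind can be sketched as follows. The exact scaling rule is not stated in the text, so the code assumes a column-wise min-max mapping onto [−10, 10]; the function names and the sample values are hypothetical (only the column-length limits of 230 and 3502 mm are taken from the database description).

```python
import numpy as np

def scale_to_range(A, lo=-10.0, hi=10.0):
    """Column-wise min-max scaling of A onto [lo, hi]; also returns the column statistics."""
    a_min = A.min(axis=0)
    a_max = A.max(axis=0)
    S = lo + (A - a_min) * (hi - lo) / (a_max - a_min)
    return S, (a_min, a_max)

def inverse_scale(S, stats, lo=-10.0, hi=10.0):
    """Map scaled values back to the original units."""
    a_min, a_max = stats
    return a_min + (S - lo) * (a_max - a_min) / (hi - lo)

# Hypothetical rows: outer diameter (mm), column length (mm), axial capacity (kN).
data = np.array([[114.0,  230.0,  800.0],
                 [300.0, 1200.0, 3500.0],
                 [550.0, 3502.0, 9000.0]])

scaled, stats = scale_to_range(data)
restored = inverse_scale(scaled, stats)
```

Keeping the column statistics makes it possible to convert the parameters suggested by a model back into physical units.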
All samples were allocated to two datasets, namely, the training set (80%) and the testing set (20%) [61]. The training set was used for model development, while the model validation was conducted via the testing set. In supervised learning, it is assumed that samples and populations follow the same distribution. Therefore, the model development phase requires that the distributions of the training and testing sets be as similar as possible. This partitioning strategy, which aims to obtain similar training and testing sets, ensures the reliability of the model evaluation provided in the testing phase. The distributions of the two datasets are shown in Figure 5.
Figure 6 shows the entire model development process. Based on the training set, the three kinds of MTL models were trained separately. Subsequently, the testing set was input into these models, and several statistical indicators were adopted to analyze the accuracy of the results.
To evaluate the reliability of the provided parameters, four statistical indicators were introduced. Table 4 provides the definitions and expressions of these indicators. $R^2$ is generally used to measure the degree of correlation between measurements and predictions [62,63]; RMSE and RRMSE are two indicators used to characterize the error between measurements and predictions [64–69]; and ρ is an indicator that comprehensively considers correlation and error [70,71]. The larger the ρ value, the less accurate the model.

Table 4. Statistical indicators: coefficient of determination ($R^2$), root mean square error (RMSE), relative root mean squared error (RRMSE), and ρ.
Note: M, measurement value; P, reference value provided by the model.
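For readers who wish to reproduce such an evaluation, the four indicators can be computed as sketched below. These are commonly used forms and are assumptions here, not necessarily the exact expressions of Table 4: $R^2$ is taken as the squared Pearson correlation between M and P, RRMSE as RMSE divided by the mean measurement, and ρ as RRMSE/(1 + R). The function name `indicators` and the sample values are hypothetical.

```python
import numpy as np

def indicators(M, P):
    """Return R2, RMSE, RRMSE, and rho for measurements M and model references P."""
    M = np.asarray(M, dtype=float)
    P = np.asarray(P, dtype=float)
    r = np.corrcoef(M, P)[0, 1]              # Pearson correlation coefficient R
    rmse = np.sqrt(np.mean((M - P) ** 2))
    rrmse = rmse / abs(np.mean(M))           # assumed: relative to the mean measurement
    rho = rrmse / (1.0 + r)                  # assumed: combines error and correlation
    return {"R2": r ** 2, "RMSE": rmse, "RRMSE": rrmse, "rho": rho}

# Hypothetical measurements and model outputs.
example = indicators([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
perfect = indicators([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
```

Under these definitions, a perfect model gives $R^2 = 1$ and ρ = 0, which is consistent with a smaller ρ indicating a more accurate model.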

Results
By adjusting and testing, the hyper-parameters of the three MTL models were obtained separately. With the hyper-parameter settings given in Table 5, the three MTL models completed their iterations. For each MTL model, the same development process was repeated five times to mitigate errors caused by randomness. All the experimental results are provided in Appendix A.
The performance of the two linear models on tasks $f_{yi}$ and $f_{co}$ was not satisfactory. Table 6 shows a relatively good set of results from the multi-task Lasso experiments (experiment 1), while Table 7 shows an acceptable set of results from the VSTG experiments (experiment 1). Compared with the strength tasks ($f_{yi}$ and $f_{co}$), multi-task Lasso and VSTG were better at predicting the dimension tasks ($D_i$ and $t_i$). As shown in Tables 6 and 7, the $R^2$ values of the strength tasks were low on both the training set and the testing set. For task $f_{yi}$, $R^2$ was always less than 0.4, which means that the provided $f_{yi}$ was only weakly correlated with the actual value. For task $f_{co}$, the two linear models were ineffective even on the training set.
Unlike the previous two models, MLS-SVR performed well in all tasks; the average result of the five experiments is displayed in Table 8. For task $D_i$, the RMSE and RRMSE values on the testing set were reduced by 40% compared with multi-task Lasso and VSTG. The RMSE and RRMSE of task $t_i$ also decreased by 20%. Moreover, all the ρ values were below 0.1. Figure 7 displays the scatter plots of the predictions provided by MLS-SVR (from experiment 4). Although there were some outliers in tasks $t_i$ (within [2, 4] and [5, 6]) and $f_{co}$ (within [40, 50]), the distribution of the scatter points still showed the accuracy of MLS-SVR in providing parameters for CFDST design.
In Figures 8 and 9, the performances of all MTL models in tasks $f_{yi}$ and $f_{co}$ are compared. The predictions of the two linear models were conservative in these two unsatisfactory tasks. In other words, the predicted values of the two linear models followed the same trend as the actual values when these increased or decreased, but the change was much less pronounced.

Discussion
This section clarifies the problems presented in Section 3 and elucidates their causes. Moreover, the feature matrices learned by the two linear models are analyzed. The limitations and future work are listed at the end of this section.

Nonlinearity
In tasks $f_{yi}$ and $f_{co}$, the linear models were unable to learn the task effectively. Underfitting occurred in both multi-task Lasso and VSTG, as their accuracy was low even on the training set. Based on this fact, there were two possible causes:
• Cause A: The provided input features did not contain the key information related to the strength tasks. In other words, all input features were useless for tasks $f_{yi}$ and $f_{co}$.
• Cause B: There was a non-linear relationship between the input features and the strength tasks, which the linear models were unable to capture.
Cause A can be ruled out, or rather, it cannot be the main issue, because MLS-SVR (with kernel methods for nonlinearity) achieved very accurate predictions in these two tasks using the same input features.
To further verify cause B, the scatter plots of the strength tasks from multi-task Lasso and VSTG are shown in Figures 10 and 11, respectively. In both tasks $f_{yi}$ and $f_{co}$, the direction of the scatter distribution (blue lines) tended to be horizontal, which was evidence of nonlinearity between the inputs and the tasks. For example, if a linear model is used to fit a concave non-linear function (as shown in Figure 12a), the scatter follows the pattern shown in Figure 12b. With a constant as the boundary, the predicted values in the smaller-value region are often overestimated, while those in the larger-value region are often underestimated. The more horizontal scatter distribution also explains why the predictions of the two linear models were more conservative in Figures 8 and 9.
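This flattening effect can be reproduced with a small numerical sketch. The example below is not based on the CFDST database; it assumes a hypothetical concave target, $y = -x^2$ on a symmetric interval, and fits a straight line to it by least squares.

```python
import numpy as np

# Hypothetical concave target on a symmetric interval.
x = np.linspace(-1.0, 1.0, 201)
y = -x ** 2

# Least-squares straight line; np.polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x, y, deg=1)
pred = slope * x + intercept

# Use the mean of the actual values as the constant boundary.
boundary = y.mean()
low = y < boundary      # smaller actual values
high = y > boundary     # larger actual values

share_overestimated_low = np.mean(pred[low] > y[low])
share_underestimated_high = np.mean(pred[high] < y[high])
```

Because the fitted line is almost flat, the predictions collapse onto a nearly constant value: every smaller actual value is overestimated and every larger one is underestimated, so a plot of predictions against actual values appears close to horizontal.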


Model Interpretability
Although the two linear models did not excel in all tasks, they had better interpretability. The coefficient matrices ($W$) obtained from the two linear models, as well as the variable-latent matrix ($U$) and latent-task matrix ($V$) provided by VSTG, are interpretable components containing information that reveals the potential correlations among CFDST parameters. The two coefficient matrices from multi-task Lasso and VSTG are shown in Figure 13. Firstly, the load-bearing capacity (axial compression capacity, $N_u$) was highly correlated with each output feature, with the strength of concrete contributing the most. Secondly, the contributions of $H$ and $f_{yo}$ to task $D_i$ and task $t_i$ were close to 0. This means that the influence of CFDST length and outer tube strength on the inner tube dimensions was minimal.
For tasks fyi and fco, there was almost no obvious sparsity in the coefficient vectors (except for the coefficient of H in task fyi in multi-task Lasso). This indicated that all inputs were indispensable for both tasks. However, the two linear models were still unable to fit tasks fyi and fco well. It is therefore reasonable to infer that certain features were missing, or that the latent information had not been fully explored. These inferences were raised and addressed in Section 4.1. Nevertheless, it is noteworthy that the coefficient vector of task fyi was relatively sparser (had smaller coefficients) than that of task fco. This may be why the scatter distribution of task fyi in Figures 10 and 11 was more concentrated, as the input feature space of task fyi was compressed more effectively.

The variable-latent matrix U and latent-task matrix V from VSTG are shown in Figure 14. The matrix U records the mapping from the original input space to the lower-dimensional feature space. Figure 14a shows that the input features Do and fyo could be expressed using only two bases in the new feature space, and that the projection of feature H on the third base M3 was also almost 0. The information in the original five-dimensional input space could therefore still be fully expressed after compression into a three-dimensional space. The matrix V is the coefficient matrix that maps the new feature space to the output space. This matrix also stores the group structure among the tasks. The tasks appear to fall into three groups:

• Group 1 (task Di): The coefficient of the second base M2 is significantly higher than the other two;
• Group 2 (task ti): The coefficient of the second base M2 is significantly lower than the other two;
• Group 3 (tasks fyi and fco): The first base is clearly important, while the values of the other two are almost 0.

This indicates that the diameter and thickness of the inner steel tube are two completely different tasks (their feature selections are opposite), while the strengths of the inner steel tube and the concrete are very similar tasks.
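The roles of U and V described above can be illustrated with a minimal sketch. All numbers below are hypothetical and chosen only to mimic the qualitative pattern reported for Figure 14; the input order (H, Do, to, fyo, Nu) and the cosine-similarity grouping rule with its 0.95 threshold are assumptions made for illustration, not part of VSTG itself, which learns the group structure during training.

```python
import numpy as np

features = ["H", "Do", "to", "fyo", "Nu"]  # assumed order of the five inputs
tasks = ["Di", "ti", "fyi", "fco"]         # the four design tasks

# Hypothetical variable-latent matrix U (5 inputs x 3 bases M1..M3).
U = np.array([
    [0.6, 0.3, 0.0],  # H: negligible projection on M3
    [0.5, 0.4, 0.0],  # Do: expressed with two bases only
    [0.2, 0.5, 0.3],
    [0.0, 0.6, 0.4],  # fyo: expressed with two bases only
    [0.4, 0.2, 0.5],
])

# Hypothetical latent-task matrix V (3 bases x 4 tasks).
V = np.array([
    [0.10, 0.80, 0.90, 0.80],  # M1
    [0.90, 0.05, 0.02, 0.01],  # M2
    [0.10, 0.70, 0.01, 0.03],  # M3
])

# Full coefficient matrix of the linear model: inputs -> tasks.
W = U @ V


def group_tasks(V, names, threshold=0.95):
    """Greedily group tasks whose columns of V point in a similar direction."""
    unit = V / np.linalg.norm(V, axis=0, keepdims=True)
    groups = []
    for j in range(V.shape[1]):
        for g in groups:
            # Compare with the first member of each existing group.
            if unit[:, g[0]] @ unit[:, j] >= threshold:
                g.append(j)
                break
        else:
            groups.append([j])
    return [[names[j] for j in g] for g in groups]


groups = group_tasks(V, tasks)
print(groups)
print(W.shape)
```

With these illustrative values, Di and ti land in separate groups because their loadings on M2 are opposite, while fyi and fco share a group because both are dominated by M1.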

Limitations and Future Work
Although the agreement between the models and the experimental results demonstrated the potential of MTL techniques for guiding CFDST design, several limitations remain:

• The samples collected in the database were all CFDSTs with circular cross-sections, as this shape is the most conventional. CFDSTs of other shapes may have more parameters, so their design is likely to be more complex.
• MTL is a data-driven approach, and its performance depends on the quantity and quality of the data. Currently, the uniaxial compression cases of CFDSTs are sufficient, while cases from tests of other properties are still lacking.
• The interpretability of linear models can reveal the potential connection among CFDST parameters. However, this connection is purely mathematical, and the mechanism behind it remains unexplained.

This study was conducted under a specific condition, and further validation of whether MTL techniques are applicable to other CFDST design situations will be necessary. As an auxiliary tool, the linear MTL models offer good interpretability, which allows MTL to play a role in the future development of CFDST design codes. In addition, mechanistic research on CFDSTs (load bearing, failure, etc.) will continue to be crucial, and more experiments and cases on the properties of CFDSTs are still needed.

Conclusions
Because previous methods can only verify the reliability of a CFDST design and cannot provide direct parameter guidance, this paper proposed using the MTL technique to guide CFDST design. The main work and findings are as follows:

• With 227 uniaxial compression cases of CFDSTs collected from the literature, three kinds of MTL models were trained to provide multiple parameters for CFDST design. Based on a specific application scenario, the development process of the MTL models was demonstrated.
• During the testing phase, MLS-SVR was able to provide reliable CFDST parameters accurately, while the two linear models, multi-task Lasso and VSTG, were unable to provide useful values for inner steel tube strength and concrete strength.
• The distribution of scattered points reflected the potential nonlinearity in tasks fyi and fco, and the implications of the scatter distribution were discussed in detail. Furthermore, the coefficient matrices of the two linear models and the potential group structure among the CFDST parameters were clarified.
• At the end of Section 4, the limitations of the study and future work were summarized.

In conclusion, the MTL technique has great potential in guiding CFDST design. With a set of directly provided parameters, the workload of engineers in CFDST design can be greatly reduced. Owing to their interpretability, linear MTL models can also serve as an analytical tool and assist in the study of the property mechanisms and design standards of CFDSTs.

By decomposing $\vec{w}_p$ into $\vec{w}_o + \vec{u}_p$, the MTL version of SVR is dedicated to learning the initial vector $\vec{w}_o$ and the evolved component $\vec{u}_p$ from the dataset. Figure 3 illustrates the intuition behind MLS-SVR.
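The shared-plus-specific decomposition can be made concrete with a small sketch. This is not MLS-SVR itself: it is a linear, bias-free, ridge-style analogue in which each task vector is written as w_o + u_p and both parts are fitted jointly by least squares. The objective, the function name `fit_shared_plus_specific`, the penalty weights `lam` and `gamma`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fit_shared_plus_specific(X, Y, lam=1.0, gamma=1e4):
    """Fit w_p = w_o + u_p for every task p with a ridge-style objective.

    Assumed objective (illustrative):
        0.5*||w_o||^2 + (lam / (2*m)) * sum_p ||u_p||^2
        + (gamma / 2) * sum_p ||y_p - X (w_o + u_p)||^2
    """
    n, d = X.shape
    m = Y.shape[1]
    # Stacked design: task p sees X in the shared block and in its own block.
    A = np.zeros((n * m, d * (m + 1)))
    for p in range(m):
        rows = slice(p * n, (p + 1) * n)
        A[rows, :d] = X
        A[rows, d * (p + 1): d * (p + 2)] = X
    y = Y.T.reshape(-1)  # task-major stacking, matching the row blocks of A
    penalty = np.concatenate([np.ones(d), np.full(d * m, lam / m)])
    theta = np.linalg.solve(gamma * A.T @ A + np.diag(penalty),
                            gamma * A.T @ y)
    w_o = theta[:d]
    U = theta[d:].reshape(m, d).T  # column p holds u_p
    return w_o, U


# Synthetic, noise-free data with two related tasks (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
W_true = np.array([[1.0, 1.2],
                   [-0.5, -0.3],
                   [2.0, 1.8]])
Y = X @ W_true

w_o, U = fit_shared_plus_specific(X, Y)
W = w_o[:, None] + U  # recombine: w_p = w_o + u_p
print(np.round(W, 3))
```

The shared vector w_o carries what the tasks have in common, while each u_p captures the task-specific deviation; the ratio of the two penalties controls how strongly the tasks are tied together.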

Figure 3. Evolution process of coefficient vectors in MLS-SVR.

Figure 4. Scatter plot matrix and correlation coefficients of the database (dots represent the samples, while lines display the distribution trend of the dots).

Figure 5. Dataset division and sample distributions in the training and testing sets: (a) division of parameters H, Do, fco, fyo and fyi; (b) division of parameters Nu, to, Di and ti.

Figure 6 shows the entire model development process. Based on the training set, three kinds of MTL models were iterated separately. Subsequently, the testing set was input into these models, and several statistical indicators were adopted to analyze the accuracy of the results.
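The train, predict, and evaluate workflow can be sketched as follows. This is an illustrative sketch only: the data are synthetic, the 80/20 split ratio is an assumption, and ordinary least squares stands in for the three MTL models. Only R2 and RMSE are computed, using their standard definitions; RRMSE and ρ would be added following the definitions given in the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_inputs, n_tasks = 227, 5, 4  # 227 cases, as in the database

# Synthetic stand-in data (hypothetical): linear relation plus small noise.
X = rng.normal(size=(n_samples, n_inputs))
W_syn = np.arange(1, n_inputs * n_tasks + 1).reshape(n_inputs, n_tasks) / 10.0
Y = X @ W_syn + 0.05 * rng.normal(size=(n_samples, n_tasks))

# Random split into training and testing sets (80/20 ratio is assumed).
idx = rng.permutation(n_samples)
n_train = int(0.8 * n_samples)
train, test = idx[:n_train], idx[n_train:]


def add_bias(X):
    """Append a column of ones so the model can learn an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


# Stand-in model: multi-output ordinary least squares fitted on the training set.
coef, *_ = np.linalg.lstsq(add_bias(X[train]), Y[train], rcond=None)
Y_pred = add_bias(X[test]) @ coef


def rmse(actual, pred):
    """Root mean squared error, one value per task (column)."""
    return np.sqrt(np.mean((actual - pred) ** 2, axis=0))


def r2(actual, pred):
    """Coefficient of determination, one value per task (column)."""
    ss_res = np.sum((actual - pred) ** 2, axis=0)
    ss_tot = np.sum((actual - actual.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot


test_rmse = rmse(Y[test], Y_pred)
test_r2 = r2(Y[test], Y_pred)
print("RMSE per task:", np.round(test_rmse, 3))
print("R2 per task:", np.round(test_r2, 4))
```

Evaluating each task separately mirrors the per-task reporting in the tables, and keeping the testing set out of the fitting step is what makes the indicators a fair measure of generalization.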

-measurement value; P-reference value provided by the model.

Figure 7. Scatter plots of the parameters provided by MLS-SVR: (a) Di plot; (b) ti plot; (c) fyi plot; (d) fco plot.

Figure 8. Comparison between the MTL-provided values and the actual values for task fyi: (a) training set; (b) testing set.

Figure 9. Comparison between the MTL-provided values and the actual values for task fco: (a) training set; (b) testing set.

Figure 12. The impact of nonlinearity on linear models: (a) a concave nonlinear function; (b) misguided scatter distribution.

Figure 13. The coefficient matrices obtained from two linear models: (a) W of multi-task Lasso; (b) W of VSTG.

Table 1. Parameter definitions and descriptions.

Table 2. Statistical description of parameters in the database.

Table 4. Statistical indicators for evaluating the reliability of the provided parameters.

Table 5. The hyper-parameter settings used for the experiments.

Table 6. Statistical evaluation of multi-task Lasso.

Table A2. All experimental results from VSTG.

Table A3. All experimental results from MLS-SVR.